Introduction to Git & GitHub

NYC Data Science Academy

Outline

  1. What is Version Control and Git?
  2. Installing Git
  3. Getting Started with Git
  4. Git Tips
  5. Undoing Changes
  6. What is GitHub?
  7. Working With Remotes

Pre-requisites

  • Familiarity with the command-line interface (CLI) and file systems.
  • A plain text editor for editing plain text files (UTF-8 encoded), such as:
    • Sublime Text
    • TextWrangler (Mac only)
    • Vim
    • Atom
    • Emacs
    • Notepad++ (Windows only)
  • Git installed on your machine, a GitHub.com account, and Internet access.

What is Version Control

  • A Version Control System (VCS) is a category of software tools that help programmers manage changes to source code over time.
  • A VCS keeps track of modifications to the code in a special kind of database.
  • It allows developers to:
    • revert files or the entire project back to a previous state,
    • compare changes over time,
    • track every individual change by each contributor and help prevent concurrent work from conflicting,
    • and more.

Distributed Version Control Systems

  • Every client fully mirrors the repository and has a full backup of all the data.
  • If the server dies, any client repository can be copied back to the server to restore it.

There are many different DVCS, but here we are going to focus on just one, Git.

What is Git?

Git is a free and open source Distributed Version Control System (DVCS) originally developed in 2005 by Linus Torvalds, the famous creator of the Linux operating system kernel.

Git is by far the most widely used modern version control system in the world. Its main strengths are:

  • Performance: Committing new changes, branching, merging and comparing past versions are all optimized for performance.
  • Security: Git has been designed with the integrity of managed source code as a top priority.
  • Flexibility: Git has been designed to support branching and tagging as first-class citizens.

Install Git on Mac OS X

There are several ways to install Git on a Mac:

  • Apple maintains and ships its own fork of Git. You can get it by installing Xcode or its Command Line Tools. If they are not installed already, macOS will prompt you to install them the first time you run git from a terminal.

  • If you want a more up-to-date version, you can also install Git via a binary installer. A macOS Git installer is maintained and available for download from the Git website, at http://git-scm.com/download/mac.

When finished, open a terminal and verify that the installation was successful by typing git --version:

$ git --version
git version 2.11.0 (Apple Git-81)

Install Git on Windows

  1. Download the Git for Windows installer from https://git-for-windows.github.io
  2. Execute the Git installer and follow the prompts on the Git Setup wizard to complete the installation. (You may choose to accept all the default settings)
  3. When finished, open the Command Prompt (CMD) or Git Bash from the Start menu and verify the installation by typing git --version:
$ git --version
git version 2.12.2.windows.1

Install Git on Linux

  • From your Terminal, install Git using apt-get (Debian/Ubuntu only):
$ sudo apt-get update
$ sudo apt-get install git
  • Verify the installation was successful by typing git --version:
$ git --version
git version 2.7.4

Configure your Git Environment

Before you can use Git, you need to configure your Git username and email using the following commands:

$ git config --global user.name "Your Name"
$ git config --global user.email you@somewhere.com

Git stores these details in the ~/.gitconfig file and associates them with any commits that you create.

You can check your Git global settings by typing the following command:

$ git config --list --global
user.name=Your Name
user.email=you@somewhere.com
... ...

Setting up a repository

You can create a Git repository in two different ways:

  • The git init command creates a new Git repository. It can be used to convert an existing, unversioned project to a Git repository or initialize a new empty repository.

  • The git clone command copies an existing Git repository.

We will first give an example of creating a local repository using git init. The GitHub section later clones a remote repository from GitHub using git clone.

Setting up a repository

Create project directory

When creating a new project on your local machine, you first need to create a new Git repository (or repo for short). A Git repository contains the history of a collection of files starting from a certain directory.

Let’s create a new directory, ~/git_proj/project_1, for our first Git project using the terminal.

~$ mkdir -p ~/git_proj/project_1
~$ tree ~/git_proj/
git_proj/
└── project_1

1 directory, 0 files

Note: to install the tree command on macOS, run: brew install tree (this requires Homebrew).

Setting up a repository

git init

To initialize a Git repository in a project directory, cd into that directory and type git init:

~$ cd git_proj/project_1/
project_1/$ git init
Initialized empty Git repository in ~/git_proj/project_1/.git/

Running the command above will create a subdirectory called .git. Aside from the .git directory, an existing project remains unaltered.

project_1$ ls -a
.  ..  .git

Note: to remove the git repository, simply delete the .git subdirectory.
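
As a rough illustration of the note above, a directory can be treated as the root of a Git repository when it contains a .git subdirectory. The Python sketch below is a simplified check under that assumption. The helper name is_repo_root is hypothetical, and real Git also recognizes other layouts (for example bare repositories) that this check ignores.

```python
import tempfile
from pathlib import Path


def is_repo_root(directory: Path) -> bool:
    """Simplified check: does the directory contain a .git subdirectory?"""
    return (directory / ".git").is_dir()


# Work in a throwaway temporary directory so no real project is touched.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project_1"
    project.mkdir()
    before_init = is_repo_root(project)   # no .git yet

    (project / ".git").mkdir()            # stand-in for what `git init` creates
    after_init = is_repo_root(project)

    (project / ".git").rmdir()            # deleting .git removes the repository
    after_delete = is_repo_root(project)

print(before_init, after_init, after_delete)  # False True False
```

The project files themselves are never touched by either step, which matches the behavior described for git init.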

Recording Changes

Add a new file to the repository

To make it a little fun, we will add a simple Python script to the project directory.

Now create a file called draw.py in the project_1 directory and add the following Python code to it using any plain text editor you like.

import turtle 

painter = turtle.Turtle()

painter.forward(100)
painter.left(90)

turtle.done()

Recording Changes

Add a new file to the repository

When this is done, you should have:

project_1$ ls -a
.  ..  .git  draw.py

Let’s run the Python script before moving to the next step:

project_1$ python draw.py # skip if you haven't installed python yet

Recording Changes

git status

Once you’ve added or modified files in a directory containing a Git repo, Git will notice the changes, but it won’t keep track of a new file unless you explicitly tell it to.

We can use the git status command to display this information:

project_1$ git status
On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

    draw.py

nothing added to commit but untracked files present (use "git add" to track)

Recording Changes

The Three States

One of the most confusing parts when you’re first learning git is the concept of the staging environment and how it relates to a commit.

A Git project contains three main sections:

  • The working directory is a single checkout of one version of the project.
  • The staging area (also referred to as the index) is a file contained in your Git repository that stores information about what will go into your next commit.
  • The repository is where Git stores the metadata and object database for your project. This is the most important part of Git, and it is what is copied when you clone a repository from another computer.

Recording Changes

Basic Git workflow

The basic Git workflow goes like this:

  1. You create or modify files in the working directory.
  2. You stage the files, adding snapshots of them to the staging area.
  3. You do a commit, which takes the files as they are in the staging area and stores that snapshot permanently to the repository.
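
The three steps above can be sketched with a toy model. This is only an illustration, not how Git stores data internally: plain Python dictionaries and a list stand in for the working directory, the staging area and the repository, and the function names add and commit are chosen to mirror the Git commands.

```python
# Toy model of Git's three areas; plain containers stand in for Git's real storage.
working_dir = {}   # file name -> current content on disk
staging_area = {}  # file name -> content captured by "git add"
repository = []    # committed snapshots, oldest first


def add(name):
    """Stage a snapshot of one file, as `git add <file>` does."""
    staging_area[name] = working_dir[name]


def commit(message):
    """Store the staged snapshot in the repository, as `git commit` does."""
    repository.append({"message": message, "files": dict(staging_area)})


# 1. Create a file in the working directory.
working_dir["draw.py"] = "import turtle\n"
# 2. Stage it.
add("draw.py")
# Edit the file again WITHOUT staging the new version.
working_dir["draw.py"] = "import turtle\npainter = turtle.Turtle()\n"
# 3. Commit: only the staged snapshot is stored.
commit("add file draw.py")

committed = repository[-1]["files"]["draw.py"]
print(committed == "import turtle\n")        # True
print(committed == working_dir["draw.py"])   # False: the later edit is not committed
```

The point of the sketch is that a commit records what was staged, not whatever happens to be in the working directory at commit time.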

Recording changes

git add

Now let’s add the file to the staging area using git add and then check the status:

project_1$ git add draw.py
project_1$ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

    new file:   draw.py

The file has now been added to the staging area and it’s time to create our first commit!

Recording changes

Warning in Windows

For Windows, adding a file to the staging area might output the following warning:

project_1$ git add draw.py
Warning: LF will be replaced by CRLF in draw.py
The file will have its original line endings in your working directory.

If you want to turn this warning off, type the following command. Note that it does so by disabling Git’s automatic line-ending conversion for this repository:

project_1$ git config core.autocrlf false
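
The warning is about line endings: Unix-style text files end each line with LF, while Windows traditionally uses CRLF. As a rough sketch of the conversion Git applies when core.autocrlf is set to true, the Python functions below convert between the two. The function names are illustrative and are not part of Git.

```python
def to_crlf(data: bytes) -> bytes:
    """Convert line endings to CRLF, roughly what happens on checkout."""
    # Normalize to LF first so existing CRLF pairs are not doubled.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def to_lf(data: bytes) -> bytes:
    """Convert line endings to LF, roughly what happens on commit."""
    return data.replace(b"\r\n", b"\n")


script = b"import turtle\npainter = turtle.Turtle()\n"
windows_copy = to_crlf(script)

print(windows_copy)
print(to_lf(windows_copy) == script)  # True: the round trip is lossless here
```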

Recording changes

git commit

Now let’s commit the staged snapshot:

  1. Run git commit. A text editor will be launched, prompting you for a message.
  2. Enter something meaningful (e.g., add file draw.py), then save and close the editor.
project_1$ git commit
[master (root-commit) a4e1f62] add file draw.py
 1 file changed, 8 insertions(+)
 create mode 100644 draw.py

Note: You can do this in a single command with the -m option:

project_1$ git commit -m "add file draw.py"

Recording changes

After Commit

After you have committed all your changes, the working directory, the staging area and the most recent commit in the repository should all hold the same copy.

Run git status again and see what happens:

project_1$ git status
On branch master
nothing to commit, working tree clean

To review the changes you have just made in the commit, try:

project_1$ git show

Recording changes

Make changes

Now repeat the two drawing lines 4 times with a for loop. Be careful with the indentation: Python uses indentation to delimit blocks.

When you finish, your script should look as follows:

import turtle

painter = turtle.Turtle()

for _ in range(4):
    painter.forward(100)
    painter.left(90)

turtle.done()

Recording changes

Make changes

If you now check the status, you should see a slightly different message:

project_1$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

    modified:   draw.py

no changes added to commit (use "git add" and/or "git commit -a")

Recording changes

Make changes

Make sure your script generates the expected output (a square), then commit the changes.

Note: for files that Git already tracks, you can stage and commit in one command with the -am options:

project_1$ git commit -am "repeat 4 time with a for loop"
[master 3498084] repeat 4 time with a for loop
 1 file changed, 3 insertions(+), 2 deletions(-)

Exercise 1

  1. Create a git repository and make your first commit by following the slides if you haven’t. Always use git status to check the changes after each step. Here are the key steps:
    1. Initialize the repository: git init
    2. Add a file to the working directory.
    3. Add snapshot to the staging area: git add <file>
    4. Commit the staged snapshot: git commit -m <message>
  2. Create a new Python script called triangle.py to draw a triangle (or anything you like) and save it to your working directory. A sample script can be found on the next slide.

Exercise 1 (cont.)

Sample script

import turtle

painter = turtle.Turtle()

for _ in range(3):
    painter.forward(100)
    painter.left(120)

turtle.done()
  3. Add your change to the staging area and then make a commit.

Note: A convenient way to stage multiple changes at once is to pass -A, --all, or . to git add:

project_1$ git add .

Viewing the Changes

Sometimes you want to know exactly what you changed, not just which files were changed. To be specific, you may want to know:

  1. What have you changed but not yet staged?
  2. What have you staged that you are about to commit?

Let’s modify the script draw.py as follows (and run it when you finish):

import turtle

painter = turtle.Turtle()

for _ in range(18):
    painter.forward(100)
    painter.left(100)

turtle.done()

Viewing the Changes

git diff

To see what you’ve changed but not yet staged (i.e., working directory vs. staging area), type git diff with no other arguments:

project_1$ git diff
diff --git a/draw.py b/draw.py
index 80e6db4..c334910 100644
--- a/draw.py
+++ b/draw.py
@@ -2,8 +2,8 @@
-for _ in range(4):
+for _ in range(18):
     painter.forward(100)
-    painter.left(90)
+    painter.left(100)
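
To see where the - and + lines come from, the same kind of line-by-line comparison can be reproduced with Python’s standard difflib module. This is only an illustration of the unified diff format, not Git’s own diff implementation, and because only the loop is compared here the line numbers in the @@ header differ from Git’s output.

```python
import difflib

# The loop as committed (old) and as modified in the working directory (new).
old = """for _ in range(4):
    painter.forward(100)
    painter.left(90)
""".splitlines(keepends=True)

new = """for _ in range(18):
    painter.forward(100)
    painter.left(100)
""".splitlines(keepends=True)

diff_lines = list(
    difflib.unified_diff(old, new, fromfile="a/draw.py", tofile="b/draw.py")
)
print("".join(diff_lines), end="")
```

Lines starting with - exist only in the old version, lines starting with + exist only in the new version, and lines starting with a space are unchanged context.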

Viewing the Changes

git diff

If you want to see what you’ve staged that will go into your next commit, you can use git diff --staged (or --cached):

project_1$ git add .
project_1$ git diff --staged
diff --git a/draw.py b/draw.py
index 80e6db4..c334910 100644
--- a/draw.py
+++ b/draw.py
@@ -2,8 +2,8 @@
-for _ in range(4):
+for _ in range(18):
     painter.forward(100)
-    painter.left(90)
+    painter.left(100)

Commit your changes after seeing the difference.

Viewing the Commit History

git status vs. git log

While git status lets you inspect the working directory and the staging area, git log operates on the committed history by showing committed snapshots.

Viewing the Commit History

git log

Now run git log and you should see something like this:

project_1$ git log
commit 8184ec25dfd8daffd3e7d1f7b63043aa70a63c17
Author: Your Name <you@somewhere.com>
Date:   Mon Apr 3 09:55:15 2017 -0400

    modify draw.py to draw a flower

commit 2341f5825a3dca4c7fde3d2cfa9c96267c589c3e
Author: Your Name <you@somewhere.com>
Date:   Mon Apr 3 09:33:45 2017 -0400

    add triangle.py script

commit 34980840ef174bf1417112d34464ea8d8770c619
Author: Your Name <you@somewhere.com>
Date:   Mon Apr 3 08:50:38 2017 -0400

    repeat 4 time with a for loop

commit a4e1f628b90a54cc5c6c9f96207371d0bd92f555
Author: Your Name <you@somewhere.com>
Date:   Sun Apr 2 23:18:29 2017 -0400

    add file draw.py

Viewing the Commit History

commit logs

By default, with no arguments, the git log command:

  • shows the commits made in that repository in reverse chronological order
  • lists each commit with its commit ID (SHA-1 checksum), the author’s name and email, the date written, and the commit message. For example, the most recent commit we’ve made so far has:
    • commit ID: 8184ec25dfd8daffd3e7d1f7b63043aa70a63c17
    • Author: Your Name <you@somewhere.com>
    • Date: Mon Apr 3 09:55:15 2017 -0400
    • message: modify draw.py to draw a flower

Viewing the Commit History

git log

We can add arguments after git log to change its behavior. For example:

  • To include a brief summary of the changes introduced by each commit:

    project_1$ git log --stat
  • To condense each commit to a single line:

    project_1$ git log --oneline
  • To draw the commit graph and show branch and other ref names:

    project_1$ git log --graph --decorate

For more examples, please check this page.

Refs and the Reflog

The Reflog

The Reflog records almost everything you’ve done in your local repo.

To view the reflog, run the git reflog command.

project_1$ git reflog
8184ec2 HEAD@{0}: commit: modify draw.py to draw a flower
2341f58 HEAD@{1}: commit: add triangle.py script
3498084 HEAD@{2}: commit: repeat 4 time with a for loop
a4e1f62 HEAD@{3}: reset: moving to a4e1f62
3f7d883 HEAD@{4}: commit: change line size
fa1d9bb HEAD@{5}: commit: change to for loop
a4e1f62 HEAD@{6}: commit (initial): add file draw.py

Note: if you compare the reflog output with the log output, you may notice that 3 entries appear only in the reflog. That’s because some commands, like reset, can remove commits from the log but not from the reflog.

Refs and the Reflog

Hashes / HEAD

Git is all about commits: you stage commits, create commits, view old commits, and transfer commits between repositories using many different Git commands.

Two of the frequently used ways to reference a commit are:

  • Hashes: the SHA-1 hash acts as the unique ID for each commit.
  • HEAD: the reference to the currently checked-out commit/branch.

For example, to show the log message and textual diff of the most recent commit we can do:

project_1$ git show 8184ec2
project_1$ git show HEAD
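
To make the idea of a hash as a content-derived ID concrete, the sketch below computes the SHA-1 ID that Git assigns to a stored object. Git hashes a short header (the object type and the content size, followed by a NUL byte) together with the content. The example uses a file-content object (a "blob") because it is the simplest kind; commit IDs are computed the same way over the commit’s own text. The function name git_object_id is hypothetical.

```python
import hashlib


def git_object_id(content: bytes, kind: str = "blob") -> str:
    """Return the 40-character SHA-1 ID Git would give this object."""
    header = f"{kind} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


empty_id = git_object_id(b"")
print(empty_id)      # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (ID of an empty file)
print(empty_id[:7])  # e69de29, the abbreviated form used in commands like git show
```

Because the ID depends only on the content, identical content always gets the same ID, and any change to the content produces a different one.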

Refs and the Reflog

Hashes / HEAD

You can also refer to commits relative to another commit. The ~ character lets you reach parent commits.

For example, the following two commands show the log message and textual diff of the grandparent of the most recent commit:

project_1$ git show 3498084
project_1$ git show HEAD~2
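
How HEAD~n maps to a commit can be sketched with a toy resolver. It assumes a purely linear history, uses the abbreviated commit IDs from the git log output shown earlier, and the function name resolve is hypothetical. Real Git follows parent links between commits rather than indexing into a list.

```python
# Abbreviated commit IDs, newest first; each commit's parent is the next entry.
history = ["8184ec2", "2341f58", "3498084", "a4e1f62"]


def resolve(ref: str) -> str:
    """Resolve 'HEAD', 'HEAD~' or 'HEAD~<n>' in a linear history (toy version)."""
    base, tilde, steps = ref.partition("~")
    if base != "HEAD":
        raise ValueError(f"unsupported ref: {ref}")
    if not tilde:
        n = 0                            # plain HEAD
    else:
        n = int(steps) if steps else 1   # HEAD~ means HEAD~1
    return history[n]


print(resolve("HEAD"))    # 8184ec2
print(resolve("HEAD~2"))  # 3498084
```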

Undoing Changes

So far we have basically covered how to record changes over time with Git. Now we will be focusing on how to recall specific versions.

First of all, let’s modify draw.py again. You can choose to add the following line above the for loop in your script, but literally any changes are acceptable since we will revert to an earlier version. Commit your change when you’re done.

painter.pensize(20)

If this is the only change you made, then you should get an output as follows:

Undoing Changes

git checkout

At the moment, our commit history looks something like this:

project_1$ git log --oneline
02391e9 make the pen thicker
8184ec2 modify draw.py to draw a flower
2341f58 add triangle.py script
3498084 repeat 4 time with a for loop
a4e1f62 add file draw.py

To check out an earlier version, we can use git checkout followed by a commit, which can be either a hash (8184ec2) or a relative ref (HEAD~1):

project_1$ git checkout 8184ec2
project_1$ git status
HEAD detached at 8184ec2
nothing to commit, working tree clean

Undoing Changes

git checkout

The git checkout command makes the entire working directory match that commit, which allows you to view an old state of your project without altering your current state in any way.

To continue developing, you need to get back to the current state of the project:

project_1$ git checkout master

Undoing Changes

git revert

Now assume we decide to undo the latest committed snapshot, which is the HEAD. We can use the git revert command:

project_1$ git revert HEAD
project_1$ git log --oneline
30e4617 Revert "make the pen thicker"
02391e9 make the pen thicker
8184ec2 modify draw.py to draw a flower
2341f58 add triangle.py script
3498084 repeat 4 time with a for loop
a4e1f62 add file draw.py

Note that the 5th commit is still in the project history after the revert. Instead of deleting it, git revert added a new commit to undo its changes.

Undoing Changes

git revert

Instead of removing the commit from the project history, git revert undoes the changes by appending a new commit with the resulting content.

This prevents Git from losing history, which is important for the integrity of your revision history and for reliable collaboration.

Undoing Changes

git reset

The git reset command is used to undo changes, and some of what it discards cannot be restored. It can be used to remove committed snapshots and to undo changes in the staging area and the working directory. Commits removed this way disappear from git log and can usually be found again only through the reflog, while uncommitted changes discarded with --hard cannot be retrieved at all. So be careful when you use it.

  • git reset <file> removes the specified file from the staging area but leaves the working directory unchanged. With no file given, it unstages everything.
  • git reset --hard resets the staging area and the working directory to match the most recent commit.

Undoing Changes

git reset

Now let’s modify draw.py by changing painter.left(100) to painter.right(100) (or anything you like) and then add the snapshot to the staging area.

Executing the following two commands will first unstage the change and then undo the change from working directory:

project_1$ git reset
Unstaged changes after reset:
M   draw.py
project_1$ git reset --hard
HEAD is now at 30e4617 Revert "make the pen thicker"

Undoing Changes

git reset [--hard] <commit> works similarly to git reset, except that it also moves the current branch tip backward to <commit>.

The figure below shows the difference between revert and reset:
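
The difference can also be sketched in code. The toy model below treats the project history as a simple list of commit messages, oldest first, which is an assumption made for illustration only. The function names revert and reset_to are hypothetical stand-ins for git revert and git reset <commit>.

```python
def revert(history, commit):
    """Return a history with one extra commit that undoes `commit`."""
    return history + [f'Revert "{commit}"']


def reset_to(history, commit):
    """Return a history with every commit after `commit` dropped."""
    return history[: history.index(commit) + 1]


history = [
    "add file draw.py",
    "repeat 4 time with a for loop",
    "add triangle.py script",
    "modify draw.py to draw a flower",
    "make the pen thicker",
]

after_revert = revert(history, "make the pen thicker")
after_reset = reset_to(history, "modify draw.py to draw a flower")

print(len(after_revert))  # 6: the history grows, nothing is lost
print(len(after_reset))   # 4: the last commit is gone from the history
```

Both approaches end with the same file content, but only revert keeps a record that the undone commit ever existed.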

Exercise 2

The git rm [file] command deletes the file from the working directory and stages the deletion.

  1. Delete the file triangle.py you just created in Exercise 1 by executing the following command:
project_1$ git rm triangle.py
  2. Get it back to your working directory (do not commit the staged snapshot).
  3. Delete the file again and commit your change this time.
  4. Use git revert to get it back to your working directory.

What is GitHub?

GitHub is a code hosting platform for version control and collaboration. It lets you and others work together on projects from anywhere.

Create your GitHub account at https://github.com if you haven’t already.

Creating a Repository on GitHub

To put your project up on GitHub, you’ll need to create a repository for it to live in.

You can store a variety of projects in GitHub repositories, including open source projects. With open source projects, you can share code to make better, more reliable software.

Now let’s create a GitHub Repo and then commit your first change by following the link https://help.github.com/articles/create-a-repo/.

Cloning a Repository from GitHub

Now that you have a repository called hello-world on GitHub, it exists as a remote repository. You can clone your repository to create a local copy on your computer and sync between the two locations.

  1. On GitHub, navigate to the main page of the repository.
  2. Under the repository name, click Clone or download.
  3. In the Clone with HTTPS section, click Copy to clipboard to copy the clone URL for the repository.

Cloning a Repository from GitHub

  4. Open Terminal and change the current working directory to ~/git_proj/ (assuming this is the location where you want the cloned directory to be created).
  5. Type git clone, paste the URL you copied in Step 3, and then press Enter.

    git_proj$ git clone https://github.com/<YOUR-USERNAME>/hello-world.git
    Cloning into 'hello-world'...
    remote: Counting objects: 6, done.
    remote: Compressing objects: 100% (3/3), done.
    remote: Total 6 (delta 0), reused 0 (delta 0), pack-reused 0
    Unpacking objects: 100% (6/6), done.

You should now have your local clone.

Working with Remotes

Collaborating with others involves managing your GitHub remote repositories and pushing and pulling data to and from them when you need to share work. To be able to collaborate on any Git project, you need to know how to manage your remote repositories.

git remote

The git remote -v command shows the URLs that Git has stored for each remote’s shortname, which are used when reading from and writing to that remote.

Now move into the project directory and check the URLs:

git_proj$ cd hello-world
hello-world$ git remote -v
origin  https://github.com/<YOUR-USERNAME>/hello-world.git (fetch)
origin  https://github.com/<YOUR-USERNAME>/hello-world.git (push)

Working with Remotes

git push

Let’s add the two scripts that you created in the project_1 repository to the hello-world repository and commit the changes.

Now the snapshot is committed to the local hello-world repository but is not yet in your remote repository.

To send those changes to your remote repository, execute:

hello-world$ git push origin master
Counting objects: 4, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 448 bytes | 0 bytes/s, done.
Total 4 (delta 1), reused 0 (delta 0)
remote: Resolving deltas: 100% (1/1), done.
To https://github.com/<YOUR-USERNAME>/hello-world.git
   8cc1a95..f124b4a  master -> master

Working with Remotes

git pull

Sometimes there might be some commits that are available on the remote repository but not on the local one and you want to fetch those remote changes.

Let’s first modify the README.md file on GitHub and commit. To update your local repository to the newest commit, execute:

hello-world$ git pull origin master
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.
From https://github.com/<YOUR-USERNAME>/hello-world
 * branch            master     -> FETCH_HEAD
   f124b4a..5047b5b  master     -> origin/master
Updating f124b4a..5047b5b
Fast-forward
 README.md | 2 ++
 1 file changed, 2 insertions(+)

Exercise 3

  1. Edit the triangle.py script in your local hello-world repo and README.md in your GitHub hello-world repo. Do not do any pull or push yet.
  2. Synchronize the two repositories using git pull / push after you commit all the changes.
  3. What happens if you do git push first?

Ignoring files, Branching and Git Cheat Sheet

For ignoring files in Git (.gitignore), please read this page:

For using branches, please read this page:

For Git Cheat Sheet, please check this page:

Exercise 1 Answers

Add a new file triangle.py to the project folder project_1. After this step you should get:

project_1$ ls
draw.py     triangle.py

Then do:

project_1$ git add .
project_1$ git commit -m "add triangle.py script"

Remember to check git status before and after each step, and git log after you commit the changes.

Exercise 2 Answers

  1. Delete triangle.py from working directory and staging area (but do not commit):
project_1$ git rm triangle.py
rm 'triangle.py'
project_1$ ls
draw.py
  2. Get it back to your working directory:
project_1$ git reset HEAD triangle.py
Unstaged changes after reset:
D   triangle.py
project_1$ git checkout -- triangle.py
project_1$ ls
draw.py     triangle.py

Exercise 2 Answers (cont.)

  3. Repeat step 1 and then do:
project_1$ git commit -m "rm triangle.py"
[master 423e5ba] rm triangle.py
 1 file changed, 9 deletions(-)
 delete mode 100644 triangle.py
  4. Use git revert to get it back:
project_1$ git revert HEAD
[master 6a210f7] Revert "rm triangle.py"
 1 file changed, 9 insertions(+)
 create mode 100644 triangle.py
project_1$ ls
draw.py     triangle.py

Exercise 3 Answers

When you want to synchronize your local Git repo with the GitHub repo, always follow these steps:

  1. git pull to download GitHub commits to local
  2. merge your local master with the GitHub origin if needed.
    Note: if you see conflicts, you have to fix them manually before you can merge. Please read this page: https://help.github.com/articles/resolving-a-merge-conflict-using-the-command-line/
  3. git push to upload local commits to GitHub
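
Why the order matters can be sketched with a toy rule: a plain push is accepted only when the local repository already contains every commit that is on the remote. The model below represents each repository as a set of commit IDs, which is a simplification made for illustration. The function name push_is_accepted and the three descriptive IDs for the Exercise 3 commits are hypothetical placeholders.

```python
def push_is_accepted(local_commits: set, remote_commits: set) -> bool:
    """Toy rule: the push works only if local already has every remote commit."""
    return remote_commits <= local_commits


# Shared starting point, using commit IDs from the push and pull slides.
shared = {"8cc1a95", "f124b4a", "5047b5b"}

# Hypothetical placeholder IDs for the two new commits made in Exercise 3.
local_commits = shared | {"local-triangle-edit"}
remote_commits = shared | {"github-readme-edit"}

# Pushing first: GitHub has a commit that the local repo lacks.
rejected_first = push_is_accepted(local_commits, remote_commits)

# git pull: bring in the remote commit and record a merge commit locally.
local_commits |= remote_commits | {"merge-commit"}
accepted_after_pull = push_is_accepted(local_commits, remote_commits)

print(rejected_first)       # False: the push is rejected
print(accepted_after_pull)  # True: after pulling, the push goes through
```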